Recognizing Objects In-the-wild: Where Do We Stand?
The ability to recognize objects is an essential skill for a robotic system
acting in human-populated environments. Despite decades of effort from the
robotics and vision research communities, robots still lack reliable visual
perception systems, which prevents the deployment of autonomous agents in
real-world applications. Progress is slowed by the lack of a testbed that
accurately represents the world a robot perceives in the wild. To fill this
gap, we introduce a large-scale, multi-view object dataset collected with an
RGB-D camera mounted on a mobile robot. The dataset embeds the challenges
faced by a robot in a real-life application and provides a useful tool for
validating object recognition algorithms. Besides describing the
characteristics of the dataset, the paper evaluates the performance of a
collection of well-established deep convolutional networks on the new dataset
and analyzes the transferability of deep representations from Web images to
robotic data. Despite the promising results obtained with such
representations, the experiments demonstrate that object classification with
real-life robotic data is far from solved. Finally, we provide a comparative
study to analyze and highlight the open challenges in robot vision and to
explain the discrepancies in performance.
Abstraction, ontology and task-guidance for visual perception in robots
To solve recognition tasks for navigating unknown environments and
manipulating objects, humans seem to rely on at least the following crucial
capabilities: abstraction (for storing higher-level concepts of things),
common-sense knowledge, and prediction. Whereas the first and second provide
the basis for situated recognition, the second and third prune the search
space by anticipating what (in an abstract sense) will be seen next, and
where. The main goal of our current research is to use such common-sense
world knowledge to guide visual perception and scene understanding. To this
end, we combine an OWL ontology with the output of vision tools. Abstraction
techniques additionally make it possible to detect higher-level concepts,
such as arches composed of a variable number of parts. The ultimate goal is
to find concepts such as doors and tables in arbitrary scenes, arriving at a
generic recognition tool for home robots. The ontology should additionally
provide task-specific information about the things to detect.
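To make the described coupling concrete (raw vision-tool labels lifted through ontological knowledge into higher-level composite concepts), here is a minimal stdlib-only sketch. It stands in for a real OWL reasoner; the taxonomy, concept names, and the "arch" composition rule are all hypothetical illustrations, not the authors' actual ontology.

```python
# Toy stand-in for an OWL ontology: an is-a mapping from raw detector
# labels to abstract concepts. All names here are hypothetical.
IS_A = {
    "pillar": "upright",
    "column": "upright",
    "beam": "crosspiece",
    "lintel": "crosspiece",
}

def abstract_label(label: str) -> str:
    """Lift a raw vision-tool label to its abstract concept
    (identity if the label is unknown to the taxonomy)."""
    return IS_A.get(label, label)

def detect_arch(detections: list[str]) -> bool:
    """Illustrative composite-concept check: an 'arch' here is any
    arrangement with at least two uprights and one crosspiece,
    mirroring the idea of a higher-level concept built from a
    variable number of parts."""
    concepts = [abstract_label(d) for d in detections]
    return concepts.count("upright") >= 2 and concepts.count("crosspiece") >= 1

# Raw vision output -> abstract concepts -> composite concept
scene = ["pillar", "column", "lintel", "chair"]
print(detect_arch(scene))       # True: two uprights plus a crosspiece
print(detect_arch(["pillar"]))  # False: parts insufficient for the composite
```

A real system would replace the dictionary with an OWL reasoner query, but the two-step structure (abstract, then compose) is the point of the sketch.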